How do you know whether the objects are randomly dispersed within an area?
In this sharing, I will be exploring point pattern analysis.
This is one of the spatial topics I didn’t manage to find out more about when I was pursuing my master degree.

Photo by Timo Wielink on Unsplash
Before jumping right into the technique, let’s take a look what is spatial analysis.
Spatial analysis is a type of geographical analysis which seeks to explain patterns of human behavior and its spatial expression in terms of mathematics and geometry, that is, locational analysis (Dartmouth College Library).
Point pattern analysis is one of the fundamental concepts for spatial analysis.
Spatial Point Pattern Analysis is the evaluation of the pattern, or distribution, of a set of points on a surface (Kam).
In other words, this technique studies how the items are distributed within the study area.
Point pattern analysis can be used in many areas, including (Bivand, Pebesma, and Gómez-Rubio 2013):
Ecology (eg. How are the different trees distributed? Are there are any competitions between the trees?)
Epidemiology (eg. Is the disease clustered within the area?)
Of course, similar could be applied in the insurance context as well.
For example, one could study whether there are any underlying geographical patterns within the location on where claims occur.
Complete spatial randomness (CSR) studies whether the events are distributed independently at random and uniformly over the study area (Bivand, Pebesma, and Gómez-Rubio 2013).
In general, we can classify the distributions into three types (National Conservation Training Center 2010):
| Types of Distributions | Descriptions |
|---|---|
| Random | Any point is equally likely to occur at any location and the position of any point is not affected by the position of any other point |
| Uniform | Every point is as far from all of its neighbors as possible |
| Clustered | Many points are concentrated close together, and large areas that contain very few, if any, points |
Usually, clustered pattern happens when there is an attraction between the points, whilst the regular patterns occur when there is inhibition among the points (Bivand, Pebesma, and Gómez-Rubio 2013).
There are other statistical tests (eg. F function, K function, L function, etc) that could help us to test whether the objects are randomly distributed.
To keep the post short and sweet, I will only explore the G function in this post.
G function measures the distribution of the distances from an arbitrary event to its nearest event (Bivand, Pebesma, and Gómez-Rubio 2013).
G function is defined as following:
\(\hat{G}(r) = \frac {\# \{ d_i : d_i\le r, \forall i \}}{n}\)
where:
the numerator is the number of elements in the set of distances that are lower than or equal to d
d is \(min_j \{ d_{ij}^{*}, \forall j \ne i \}\)
n is the total number of points
Before jumping straight into the geospatial technique, let’s understand the common data files (i.e. shapefiles) used for the geospatial purposes.
A shapefile is a simple, nontopological format for storing the geometric location and attribute information of geographic features (esri a).
Usually the shapefiles will contain at least .shp, .shx, .dbf, and .prj (esri b).
Instead of our usual import function (eg. read_csv, read_excel etc), we will use the function that is specially designed to handle shapefiles.
I will be using Singapore Bus Stop Location dataset for this demonstration.

The dataset can be downloaded from LIA Data Mall.
To download the data files, below are the steps to download the data from the Data Mall:
Go to the ‘Static Datasets’ section
Then, choose ‘Road Infrastructure’
Filter the search by entering ‘Bus Stop Location’ in the search bar

I will also download the Singapore planning zone map from Data.gov.sg.
Okay, let’s set up the environment for the demonstration later.
packages = c('sf', 'maptools', 'spatstat', 'tmap')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
These are the documentation pages for sf, tmap, maptools, and spatstat.
First, I will import the data into the environment.
As the file is a shapefile, I will use st_read function to read in the shapefiles.
df <- st_read(dsn = "data/BusStopLocation", layer = "BusStop")
Reading layer `BusStop' from data source `C:\Users\Jasper Lok\Documents\2_Data Science\my-blog\_posts\2022-02-12-point-pattern-analysis\data\BusStopLocation' using driver `ESRI Shapefile'
Simple feature collection with 5156 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
Projected CRS: SVY21
Over here, I have indicated the folder name under dsn argument and file names under layer argument.
I will also import the planning zone info into the environment for later purposes.
mpsz <- st_read(dsn = "data/SG Map", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\Jasper Lok\Documents\2_Data Science\my-blog\_posts\2022-02-12-point-pattern-analysis\data\SG Map' using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
One common check before the analysis is to ensure all the data is using the same projection system.
To check this, we can use st_crs function to extract the information of the geospatial data.
st_crs(df)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
The geodetic CRS from this geospatial data is SVY21 and ellipsoid is WGS 84.
Similarly, I will check the information of the mpsz spatial objects to ensure they are projecting on the same projection system as the spatial objects of the bus stop.
st_crs(mpsz)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
I use tmap package as this package allows us in creating interactive maps.
To use tmap, we will start with tmap_mode function. Since I want to plot an interactive graph, I will choose ‘view’ option.
Then I will indicate the shape object in the tmap_shape function. After that, we will need to indicate how we would like the plotting function to draw the symbols.
tmap_mode('view')
tm_shape(df) +
tm_dots()
One cool thing about this package is that the API of this package is based on the grammar of graphics. Hence, the syntax is pretty similar to ggplot package.
Another fantastic part of this package is it also allows us to choose leaftlet layer to change the maps.
To do so, we can hover over the “stacking” button below the “minus” button in the graph. The list will pop up as shown below.

Alternatively, we can indicate the interested layer through tm_basemap function as shown below. This will provide users with an alternative method to choose from a wider range of layers.
tmap_mode('view')
tm_basemap("Esri.DeLorme") +
tm_shape(df) +
tm_dots()
You can find other different layers in the following link.
From the map, there are 5 bus stops outside of Singapore! These are valid points since there are buses going back and forth to Johor Bahru every day.
Next, we will attempt to find out whether the bus stops are randomly distributed by running a hypothesis test.
\(H_{0}:\) The bus stops in Singapore is randomly distributed
\(H_{\alpha}:\) The bus stops in Singapore is not randomly distributed
To run the Monte Carlo test, we will first need to convert the spatial object into a point pattern object.
The simulation function, envelope function can only accept point pattern or a fitted point process model as stated in the documentation page.
First, I will convert the simple feature data frame into spatial points by using as_Spatial function.
busstop_sp <- as_Spatial(df)
Then, I will convert the spatial points to spatial points in spastat package. This is to allow us to perform the necessary test later.
maptools package will provide some functions to convert between ppp objects representing two-dimensional points patterns between spatstat package and SpatialPoints classes
busstop_ppp <- as(busstop_sp, "ppp")
Once the conversion is done, we can pass the spatial object into envelope function.
envelope function will randomly simulate many point patterns so that summary function is computed for all of them (Bivand, Pebesma, and Gómez-Rubio 2013).
G_sg_csr <- envelope(busstop_ppp,
Gest,
nsim = 1000)
Generating 1000 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990......... 1000.
Done.
I will also indicate the nsim should be 1,000, so that the function will simulate over 1,000 times.
Following is how to interpret the graph:
If the line for observations (i.e. the solid line) is outside and above the grey region, this indicates that the points are exhibiting a clustered pattern
If the line is outside and below the grey region, this indicates the points are exhibiting a regular pattern (i.e. the reverse of clustered pattern)
If the line falls within the grey region, there is statistical evidence that the points are randomly distributed
plot(G_sg_csr)

As shown above, we can see the solid line lies above the grey area. Hence we reject the null hypothesis and conclude that the bus points exhibits clustered pattern.
Instead of performing a statistical test for the entire area, we could segment out the area we are interested in and perform the necessary statistical test.
First, I will list out the planning areas within the Singapore map.
[1] "ANG MO KIO" "BEDOK"
[3] "BISHAN" "BOON LAY"
[5] "BUKIT BATOK" "BUKIT MERAH"
[7] "BUKIT PANJANG" "BUKIT TIMAH"
[9] "CENTRAL WATER CATCHMENT" "CHANGI"
[11] "CHANGI BAY" "CHOA CHU KANG"
[13] "CLEMENTI" "DOWNTOWN CORE"
[15] "GEYLANG" "HOUGANG"
[17] "JURONG EAST" "JURONG WEST"
[19] "KALLANG" "LIM CHU KANG"
[21] "MANDAI" "MARINA EAST"
[23] "MARINA SOUTH" "MARINE PARADE"
[25] "MUSEUM" "NEWTON"
[27] "NORTH-EASTERN ISLANDS" "NOVENA"
[29] "ORCHARD" "OUTRAM"
[31] "PASIR RIS" "PAYA LEBAR"
[33] "PIONEER" "PUNGGOL"
[35] "QUEENSTOWN" "RIVER VALLEY"
[37] "ROCHOR" "SELETAR"
[39] "SEMBAWANG" "SENGKANG"
[41] "SERANGOON" "SIMPANG"
[43] "SINGAPORE RIVER" "SOUTHERN ISLANDS"
[45] "STRAITS VIEW" "SUNGEI KADUT"
[47] "TAMPINES" "TANGLIN"
[49] "TENGAH" "TOA PAYOH"
[51] "TUAS" "WESTERN ISLANDS"
[53] "WESTERN WATER CATCHMENT" "WOODLANDS"
[55] "YISHUN"
Next, I will segment out the area of interest. In this example, I will segment “DOWNTOWN CORE” out from the planning zone as shown following.
downtown <- mpsz %>%
filter(PLN_AREA_N == "DOWNTOWN CORE")
Similarly, we can pass the spatial object to tmap function to see the selected area on the map.
tmap_mode("view")
tm_shape(downtown) +
tm_polygons()
We will then convert the spatial object into spatial polygons as shown below.
Next, we need to convert the object into an owin object.
Owin is defined as the “observation window” of a point pattern (R documentation).
downtown_owin <- as(as_Spatial(downtown), "owin")
If we call the owin object, it will tell us the polygon boundary of the selected area.
downtown_owin
window: polygonal boundary
enclosing rectangle: [28896.262, 31528.047] x [27914.19, 31714.21]
units
Once that is done, we could “cut” the observation location as shown below.
busstop_ppp_downtown <- busstop_ppp[downtown_owin]
Finally, we will pass the point pattern object into the G function to perform the necessary statistical test.
Below is the hypothesis test:
\(H_{0}:\) The bus stops in Downtown core is randomly distributed
\(H_{\alpha}:\) The bus stops in Downtown core is not randomly distributed
G_downtown_csr <- envelope(busstop_ppp_downtown,
Gest,
correction = "best",
nsim = 1000)
Generating 1000 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990......... 1000.
Done.
Lastly, I will plot out the simulated result.
plot(G_downtown_csr)

Interesting!
The result is showing us a different outcome when we zoom into one of the planning areas.
The bus stops in the selected area are neither clustered nor equally distributed.
This makes sense since it’s very unlikely the location of bus stops are concentrated in one area or have equal distance away from each other.
However, when we look at the bus stop locations for the entire Singapore, the majority of the bus stops will be located at the residency areas or where there are more activities.
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Till next time, happy learning!

Photo by shawnanggg on Unsplash